Latent Dirichlet Allocation Model Training With Differential Privacy

Authors

Abstract

Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for hidden semantic discovery of text data and serves as a fundamental analysis tool in various applications. However, both the LDA model and its training process may expose information about the training data, thus raising significant privacy concerns. To address the privacy issue in LDA, we systematically investigate the privacy protection of the main-stream LDA training algorithm based on Collapsed Gibbs Sampling (CGS) and propose several differentially private LDA algorithms for typical training scenarios. In particular, we present the first theoretical analysis of the inherent differential privacy guarantee of CGS-based LDA training, and further propose a centralized privacy-preserving algorithm (HDP-LDA) that can prevent data inference from the intermediate statistics of CGS training. Also, we propose a locally private LDA training algorithm (LP-LDA) on crowdsourced data to provide local differential privacy for individual data contributors. Furthermore, we extend LP-LDA to an online version, OLP-LDA, to achieve LDA training on locally private mini-batches in a streaming setting. Extensive experiment results validate both the effectiveness and efficiency of our proposed algorithms.
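As one illustration of the local-privacy setting the abstract mentions, the sketch below applies the classical Laplace mechanism to each contributor's word-count vector before aggregation. This is a generic local-DP sketch under my own naming, not the paper's LP-LDA mechanism; the noise scale 1/ε is a simplification, and a rigorous guarantee would calibrate it to the L1 sensitivity of the whole count vector.

```python
import numpy as np

def perturb_counts(counts, epsilon, rng):
    # Each contributor adds Laplace(1/epsilon) noise to every
    # vocabulary count before sending it to the aggregator
    # (simplified per-entry calibration; see lead-in caveat).
    return counts + rng.laplace(0.0, 1.0 / epsilon, size=counts.shape)

rng = np.random.default_rng(0)
vocab_size, n_users, epsilon = 50, 2000, 1.0
# Synthetic per-user word-count vectors
true_counts = rng.integers(0, 4, size=(n_users, vocab_size)).astype(float)
# The server only ever sees the locally perturbed vectors
noisy_total = sum(perturb_counts(c, epsilon, rng) for c in true_counts)
# The zero-mean noise averages out across many users,
# so aggregate statistics remain usable for model training
error = np.abs(noisy_total - true_counts.sum(axis=0))
```

With a few thousand contributors, the per-word aggregate error stays small relative to the true totals, which is what makes topic-model training on such noisy aggregates feasible.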


Similar resources

Experiments with Latent Dirichlet Allocation

Latent Dirichlet Allocation is a generative topic model for text. In this report, we implement collapsed Gibbs sampling to learn the topic model. We test our implementation on two data sets: classic400 and Psychological Abstract Review. We also discuss different evaluations of the goodness-of-fit of the models and how parameter settings interact with the goodness-of-fit.


Spatial Latent Dirichlet Allocation

In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a “bag-of-words”. It is also critical to properly design “words” an...


Latent Dirichlet Allocation

We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where t...


Latent Dirichlet Allocation

Latent Dirichlet allocation (LDA) is a generative topic model for finding latent topics in a text corpus. It can be trained via collapsed Gibbs sampling. In this project, we train LDA models on two datasets, Classic400 and BBCSport. We discuss possible ways to evaluate goodness-of-fit and to detect the overfitting problem of the LDA model, and we use these criteria to choose proper hyperparameters, ...
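The collapsed Gibbs sampler that this abstract (and the main paper) relies on can be sketched as follows. This is a generic textbook illustration with my own function and parameter names, not either work's actual implementation: each token's topic is resampled from its full conditional given all other assignments.

```python
import numpy as np

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))   # document-topic counts
    nkw = np.zeros((n_topics, vocab_size))  # topic-word counts
    nk = np.zeros(n_topics)                 # tokens per topic
    # Random initial topic assignment for every token
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Exclude the current token from all counts
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Full conditional: p(z=k) ∝ (n_dk + α)(n_kw + β) / (n_k + Vβ)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw
```

The topic-word counts `nkw` are exactly the intermediate statistics that the main paper's HDP-LDA aims to protect, since releasing them across iterations can leak information about individual documents.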


Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates automatic keyword extraction from the tables of contents of Persian e-books in the field of science using LDA topic modeling, evaluating the keywords' similarity with a golden standard and users' viewpoints of the model's keywords. Methodology: This is a mixed-methods text-mining study in which LDA topic modeling is used to extract keywords from the table of contents of sci...



Journal

Journal title: IEEE Transactions on Information Forensics and Security

Year: 2021

ISSN: 1556-6013, 1556-6021

DOI: https://doi.org/10.1109/tifs.2020.3032021